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Abstract 



To represent the evolution of nucleic acid and protein sequence, 
we express the parallel and Eigen models for molecular evolution in 
terms of a functional integral representation with an /i-letter alpha- 
bet, lifting the two-state, purine/pyrimidine assumption often made in 
quasi-species theory. For arbitrary h and a general mutation scheme, 
we obtain the solution of this model in terms of a maximum prin- 
ciple. Euler's theorem for homogeneous functions is used to derive 
this 'thermodynamic' formulation of evolution. The general result for 
the parallel model reduces to known results for the purine/pyrimidine 
h = 2 alphabet and the nucleic acid h = 4 alphabet for the Kimura 
3 ST mutation scheme. Examples are presented for the h = A and 
/i = 20 cases. We derive the maximum principle for the Eigen model 
for general h. The general result for the Eigen model reduces to a 
known result for h = 2. Examples arc presented for the nucleic acid 
h = 4 and the amino acid ft. = 20 alphabet. An error catastrophe 
phase transition occurs in these models, and the order of the phase 
transition changes from second to first order for smooth fitness func- 
tions when the alphabet size is increased beyond two letters to the 
generic case. As examples, we analyze the general analytic solution 
for sharp peak, linear, quadratic, and quartic fitness functions. 

1 Introduction 

There are two classical physical models of molecular evolution: the Eigen 
model [1-3] and the Parallel or Crow-Kimura model [4]. These models were 
originally formulated in the language of chemical kinetics [1], by a large 
system of differential equations representing the time evolution of the relative 
frequencies of each sequence type. Quasi-species models capture the basic 
microscopic processes of mutation and replication, for an infinite population 
of binary sequences. The most remarkable feature of these models is the 
existence of a phase transition, termed the "error threshold" [1,5], when 
the mutation rate is below a critical value, separating a disordered non- 
selective phase from an organized or "quasi-species" phase. The quasi-species 
is characterized by a population of closely related mutants, rather than by 
identical sequences [1-3], and its emergence is related to the auto-catalytic 
character of the replication process [1,5], which exponentially enriches the 
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proportion of the fittest mutants in tfie population. Experimental studies 
provide support for quasi-species theory in the evolution of RNA viruses [6,7]. 

The choice of a binary alphabet, which simplifies the mathematical and 
numerical analysis of the theory, represents a coarse graining of the four- 
letter alphabet of the nucleic acids DNA/RNA (A,C,G,T/U), by considering 
the two basic chemical structures of nitrogenated bases, purines (A,G) and 
pyrimidines (C,T/U). The choice of a four-letter alphabet represents nucleic 
acids. A 20-letter alphabet represents amino acids and protein structure and 
permits a close connection between sequence and fitness. 

Most numerical and analytical studies on quasi-species models consider 
the binary alphabet simplification [1-4]. In particular, the assumption of a 
binary alphabet allows for an exact mapping of the quasi-species models into 
a 2D Ising model [8,9], or into a quantum spin chain [10-14]. An exception 
is [15], where a four- letter alphabet was studied by a quantum spin chain 
representation of the parallel model. Other approaches to the nucleic acid 
evolution problem have been presented in [16, 17]. In all these studies it 
has been shown, through the application of different methodologies, that the 
steady state mean fitness of the population can be expressed in terms of a 
maximum principle, in the limit of infinite sequence length [N — > oo). The 
Probenius-Perrone theorem guarantees that there is a unique steady-state 
population distribution. It has been shown [18] that for a general family 
of linear (or effectively linear) models that evolve according to a matrix 
H = M + R, with M a Markov generator (typically representing mutations) 
and R a diagonal matrix (usually representing replication or degradation) 
of dimension N ^ oo, the Probenius-Perrone largest eigenvalue can be ex- 
pressed in terms of a Raleigh-Ritz variational problem. The high dimensional 
variational problem can be reduced to a low-dimensional maximum princi- 
ple with an error 0(1/A^), when basic symmetries can be assumed in the 
evolution matrix, such as permutation invariance of the replication rate, or 
symmetric mutation rates, which allows for a lumping [18] of the large se- 
quence space into sequence types or classes. This analysis was applied in [17] 
to obtain a variational expression for the mean fitness in the Kimura model 
with a four letters alphabet. We note that the Eigen model, where replica- 
tion and (multiple) mutations are correlated, possesses a different algebraic 
structure than the general family of models studied in [18]. In the Eigen 
model, the evolution matrix is of the form H — Q x R, with Q representing 
the mutation matrix, and R a diagonal replication matrix. 

In this article, we present exact analytical solutions of the /i-alphabet 
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Crow-Kimura and Eigen models by means of a quantum field theory. Our 
method generalizes the Schwinger spin coherent field theory for the binary 
alphabet in [19] to an alphabet of arbitrary size h. This method has also 
been recently applied in the solution of a model that includes transfer of 
genetic material between sequences in quasi-species theory [20,21], and two- 
parent recombination [21]. For the parallel model, we present exact analytical 
solutions of this field theory, in terms of a maximum principle, for the steady 
state mean fitness of the population and average composition, Eq. (jH]). We 
present as examples, results for the Kimura 3 ST mutation scheme [22], Eq. 
(54). We develop in detail the result for the symmetric mutation rate scheme, 
Eq. (55), for four different examples of microscopic fitness functions: sharp 
peak Eqs. (57) and (58), Fujiyama landscape Eqs. (60)-(63), a quadratic 
landscape Eqs. (65)-(69), and a quartic landscape Eqs. (71) and (72). In 
section 2.9, we apply our general formula to derive the mean fitness for the 
symmetric, general h case and discuss the /i = 20 amino acid alphabet, Eq. 

(74) . For the symmetric case, we present results for the sharp peak, Eqs. 

(75) -(77), and the quadratic case, Eq. (77). 

For the Eigen model, we present the exact expression for arbitrary al- 
phabet size h, Eq. (106). As an example, we apply the general expression 
to a mutation scheme analogous to the Kimura 3 ST [22], Eq. (112). We 
analyze in detail the solution for the symmetric mutations rate, Eq. (113), 
for four different examples of microscopic fitness functions: sharp peak Eqs. 
(114) and (115), Fujiyama landscape Eqs. (116-119), quadratic fitness land- 
scape Eqs. (121-123), and quartic fitness landscape Eqs. (125) and (126). In 
section 3.8, we apply our general solution to derive the mean fitness for the 
symmetric, general h case and discuss the /i = 20 amino acid alphabet, Eq. 

(127) . For the symmetric case, we present results for the sharp peak, Eqs. 

(128) -(129), and the quadratic case. 

These results bring quasi-species theory closer to the real microscopic evo- 
lutionary dynamics that occurs in the natural four-letter alphabet of nucleic 
acids or the 20-letter alphabet of amino acids. 
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2 The parallel model for an alphabet of size 
h 

The parallel model [4] describes the continuous time evolution of an infinite 
size population of viral genetic sequences. The evolutionary dynamics is 
driven by point mutations and selection, with mutations occurring in parallel 
and independently of viral replication. Each viral genome is represented as 
a sequence of N letters, from an alphabet of size h, and therefore the total 
number of different viral genomes in the population is h'^ . If we describe a 
viral genetic sequence in the alphabet of nucleic acids (DNA or RNA), the 
natural choice would he h = 4, and explicitly the alphabet corresponds to 
(A,C,G,T or U). It is common, to simplify the theoretical analysis, to choose 
instead a coarse grained alphabet of size /i = 2, by 'lumping' together purines 
(A, T or U) and pyrimidines (C,G). Alternatively, to describe evolution at 
the scale of protein sequences, the natural choice is to consider the h = 20 
amino acid alphabet. We here consider the case of general h. 

The probability pi for a virus to have a genetic sequence Si, 1 <i < , 
evolves according to the following system of nonlinear differential equations 

= Pi{ri - J2 "^iPi) + 5Z ^'^iPi 

Here rj is replication rate of sequence Si, and /ijj is the mutation rate from 
sequence Sj into sequence S^. The nonlinear term in Eq. ([T]) represents the 
average replication rate in the population, or mean fitness. This non-linear 
term enforces the conservation of probability, ^jPi = 1. This term can be 
removed through a simple exponential transformation, to obtain the linear 
system of differential equations 

= riqi + Y^lii3q3 (2) 

where Pi{t) = q,{t)/ qj{t). 
2.1 Fitness landscape 

We will assume that the replication rate (fitness) of an individual in the 
population depends on its relative composition with respect to a wild-type 
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sequence S*^. We define the relative composition variables, x", to be the 
number of letters of type a, divided by A^. The number of different letters is 
h, and the set of labels a refers to the set of chemical possibilities, such as 
{purine,pyrimidine}, {A,C,G,T}, or the 20 amino acids. For an alphabet of 
size h, at each site along the sequence, there are h — 1 independent composi- 
tions 0<x"<l, for !<«</;, — 1. Alternatively, these compositions may 
be interpreted as normalized Hamming distances from a reference wild-type 
sequence. Therefore, the replication rate for a sequence Si in the parallel 
model Eq. ([1]) is defined by the fitness function 

ri = Nf{x\x',...,x''-') (3) 

where the are defined within the simplex Y2a=\ ^ 1 • 



2.2 Schwinger spin coherent states representation of 
the parallel model 

We can express the parallel model in operator form, by generalizing the 
method presented in [19]. We define h kinds of creation and annihilation 
operators: dl^{j) , aa{j) , I < a < h and 1 < j < A^. These operators satisfy 
the commutation relations 







(4) 



These operators create/annihilate a sequence letter state 1 < a < /i, at 
position 1 < j < in the sequence. Since at each site there is a single 
letter, we enforce the constraint 



^al{j)aa{j) = 1 



(5) 



a=l 



for all 1 < j < A^. 

We define as the power on ajj(j) for the sequence state S",, 1 < i < 

, defined by the vectors 



N 



(6) 
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where |n*) = Y[a=i [^qO)]""*^''^ The constraint in Eq. ([5]) ensures that 
the condition X]a=i^a(i) — ^ ^^r all 

We introduce the unnormalized population state vector 



h 



N 



i=l 

which evolves in time according to the equation 

= -^1^) (8) 

Here, the Hamiltonian operator, to highest order in A^, is given by 

~ H = m + Nf{x\x'^,...,x^-^) (9) 

where m represents the mutation operator, and represents the composi- 
tions in operator form. 

Let us first discuss the mutation operator m. In the most general case, 
h{h — l) possible different substitutions can occur at each site in the sequence, 
i.e. /3 — i> a, with mutation rate that need not be symmetric. 

Each individual process can be written in the operator form 

at(j)r"/^a(j) = aiU)apU) + (j)a,(j) (10) 

which represents the creation of letter a by annihilation of letter j3. Here, 
the matrices t""^ are explicitly defined by 

[t-^]^ = 5,J,p + 5^{1-5,^) (11) 

After these definitions, the more general expression for the mutation operator 
is 

N h 

^ = 5^ 5Z /"a/? p(j>"^a(j) - at(j) . a(j)] (12) 

Let us now consider the Schwinger spin coherent state representation of 
the average base composition terms. 
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N N 

= iv E «a(j)«a(j) ^ ^ E i<a<h-i (13) 



where we defined the matrices 

We introduce the vector notation 

X — ^ X ^ • • • ^ X ^ 



(14) 



(15) 



= (e\ e^ . . . , e'^-i) (i6) 

Considering the previous expressions, the Hamiltonian operator becomes 



H = Nf 



N 



i=i 



N h 

E E /^"/^ pnj>"^s(j) - at(j) . a(j) 

(17) 



2.3 Functional integral representation of the parallel 
model 

We convert the operator representation of the parallel model into a functional 
integral form by introducing Schwinger spin coherent states [19]. We define 
a coherent state by 



e 2 



|o) 

E n ^"i— r \{'mi,m2,...,mh)j) 



mi,m2,---,mh- 



Coherent states satisfy the completeness relation 



(18) 



(19) 



The overlap between a pair of coherent states is given by 

{z'{j)\z{j)) = e-H^^'*0>[^^(j)-^^0-)]-[^^'*0')-^^(i)l-^^(i)} 
To enforce the constraint Eq. ([5]), we introduce the projector 



(20) 



N 



N 



n 



dX 



27T 



(21) 



At long times, due to the Perrone-Frobenius theorem, we find that the 
system evolution is dominated by the unique largest eigenvalue, fm, of —H 
and its corresponding eigenvector such that e~^*|{?2o}) ~ e-^™-* | ■?/'*)• 

To evaluate this eigenvalue, we perform a Trotter factorization, for e = 
t/M, with M — > oo, and introduce resolutions of the identity as defined by 
Eq. ( fT9l) at each time slice [19] 



e"-^* = lim 

M^oo 



■ M N 

nn 

.k=i j=i 



dzl{])dzk{3) 



TT" 



M 



\{zM})\{{{Su}\e~'''\{z,.,}){{z,}\ 



k=l 



[22) 



We define the partition function 



Tr e-^^P 

2^ r ^ 

n 



2n 



e"*^^ lim 

M^oo 



■ M AT 

nn 

k=l j=l 



dzl{j)dzk{j) 



71'' 



-S[z',z\ 



(23) 



Here, we defined 



e-'l^'^ = l[{{z,}\e-^^\{z,_^}) 



(24) 



k=l 



with the boundary condition io(j) = e^^^ zm^]) [19]. An explicit expression 
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for the matrix element in the coherent states representation is 



/ 1 ^ 



h 

eN f^c.f^ + ^Nf 

N h 



■ 1 ^ 

-5^zl(j)e4-i(j) 

1=1 



1\ h X 

j=l a^/3=l ^ 



Let us now introduce an (/;,-l)-component vector field Xk = (x^, ■ ■ ■ 
with 



1 ^ 



(26) 



We make this definition by introducing an integral representation of the 
corresponding delta function 



M 



1) 



1 ^ 



k=l 

-Q jj ieNdxldxl 

k=l a=l 



N 



27r 



-^^^Ef=i4-5?fc+eEfcLiEf=i^(i)4-e4-i{i) 



(27) 



Inserting this into the functional integral Eq. ( !23|) . we have [19] 

M^oo y 



X 



JV 



{zo}={e'^JZAf } 



(28) 



(25) 
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The matrix S{j) has the structure 



S{j) 



/I 

-A^ij) I 
-As{j) I 








V 







(29) 



/ 



where = I + £[Ylaft/3=i f^a/s^"'^ + ^k-&]- After performing the Gaussian 

integration over the coherent state fields, we obtain 



Z ^ lim / V\x\V\x]e'^^k=llf(Sk)-Sk■Sk-J:^^0=l^lc.0] 

M^OO J L J L J 

7=1 



(30) 



Here, 



det,5(j) = det 
= det 



M 



Trln 



= e 



I-e^^^l[A,{j) 

k=l 



(31) 



where the operator T indicates time ordering. Substituting this result in the 
partition function, we obtain 

M^oo J 

X / P[A] JJe-*^^e I J 

r ^ 

= lim / P[^P[f]e'^^^=i[-^(^'=)-^'=-^'=-^«^/3=iMa/3]lTg (32) 

.7 = 1 
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with 



M 



Q 



lim Tr T TT 

M^oo J- J- 



k=l 



lim Tr Te''^'==if^^7^'3=i'^"'^^"''"'"^'='®l 

M^oo 

Tr f e-^o rf<'[E!^^;3=i /^..^r-f +s{t')-0] 



After taking the hmit M —>■ oo, Eq. fl32l) becomes 

where the effective action is given by 

S[x,x\ = -N dt'[f{x{t')) - x{t') ■ x{t') - ^ Hap]-N\nQ 



Here, we have defined 

Q = Trre^o'^*'E*i'^"''""''+^'^i^"(*')®"l 



(33) 



(34) 



(35) 
(36) 



2.4 The large N limit of the parallel model is a saddle 
point 

Considering that the sequence length N is very large, — >■ oo, we can 
evaluate the functional integral Eq. for the partition function by looking 
for a saddle point. With the action defined in Eq. (!36|) . we have 



6S 



6S 



-N 



dm 



dx'^ 







6x'^ 



" Q5x'' 







(37) 



We denote the value of the action at the saddle point by Sc- We have therefore 
the system of equations 



df[x] 



(38) 
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(6"), l<a<h-l 



where we defined 



((•)) 



(39) 



(40) 



After tliis saddle-point analysis, we obtain a general expression for the 
mean fitness fm of the population, for an arbitrary microscopic fitness func- 
tion /(f). 



fm = lim — ^ = max 

N,t^OO Nt {^c,Sc} 

with Ajnax defined as 

Am 



(41) 



lim 



InQ 



t— >oo t 

and corresponding to the largest eigenvalue of the matrix 

h h-1 



(42) 



M(C{/i,^})= 5^ /x„/3r"^ + X;^c0" 



(43) 



As shown in detail in Appendix 1, the compositions a;" can be eliminated 
to reduce Eq. (HTj) to the final expression 



(h) 



max < f{x^, X 

-1 



1 ™2 /i 
c ' • • • ' "^c 



h-1 



X^Xc X^ 



h-1 



a=l 
h-1 



\ 



X 



h-1 
7=1 



/3=1 



\ 



h-1 \ h-1 

1 - 5^ xZ xf + 5^x^-1 

7=1 / 7=1 



(44) 



Here, the compositions < < 1 are defined within the simplex Yla=i — 
1. 
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2.5 The purine/pyrimidine alphabet h=2 

Most analytical and numerical studies of quasi-species models have been 
formulated in the past [12,14,19,20] by using a coarse-grained alphabet of 
nucleotides, where the nucleotide bases are lumped into purines and pyrim- 
idines, and hence h = 2. The maximum principle for this binary alphabet 
can be derived from the general expression (jS]), by assuming a symmetric 
mutation rate /ii2 = fi2i = Z^, and by noticing that a single composition 
xl = Xc (or normalized Hamming distance) is required in this case, 

/i^) = max {fix,) + /i[2v/x,(l-xj - 1]| (45) 

{0<a;c<l} J 

It is customary to use in this case a magnetization coordinate [12,14,19,20] 
defined as = 1 — Sxc, and hence Eq. fHSj) becomes 

= ^_J^f<i} {/(^^) + /^V^l^ - /^} (46) 

Eq. (H6!) is a well known result, which has been obtained in the past with 
different methods [12,14], including a version of the Schwinger boson method 
employed in the present work [19]. It is a special case of Eq. ( I44l) . 



2.6 The nucleic acid alphabet h = 4 



In the nucleic acids RNA or DNA, the alphabet is constituted by the monomers 
of these polymeric chains, which are h = 4 different nucleotides A, C, G, T/U. 
The general mutation scheme displayed in Fig. [1] is represented by setting 
h = 4 in our general solution Eq. (jS]), with 3 independent compositions (or 
normalized Hamming distances) xl 



1 s 



max 

{x^,x^,x'i} 




(47) 
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M34 

(2)C < » 

M24 



Figure 1: The generalized mutation scheme 

Here, the compositions are defined within the simplex + + < 1. 

An interesting reduction of this general model is provided by the Kimura 
3 ST mutation scheme [15-17,22], Fig.O The Kimura 3 ST mutation scheme 
considers mutations in three independent directions, with rates yUi,/i2,/^3 

^ f^i 

X 1^2 

Accordingly, three components of the Hamming distance between a pair 
of sequences Si and Sj are defined as follows 

di{Si, Sj) = ij^A^ciSii Sj) + i^G^xiSi, Sj) 

d2{Si, S,) = ifA^GiS^, S,) + ifcMS^, S,) (48) 

dsiSi, Sj) = i^A^riSi, Sj) + ^^c^ciSi, Sj) 

Here, ^Xi-*Y{Si, Sj) is the number of sites at which X and Y are exchanged be- 
tween sequences Si and Sj. The total Hamming distance between sequences 
Si and Sj is given by 

d{Si, Sj) = di{Si, Sj) + d2{Si, Sj) + ds^Si, Sj) (49) 

The mutation rate is therefore modeled by the function 

{/Iq,, da{Si, Sj) = d{Si, Sj) = 1 

-^ELi/^a, Si = Sj (50) 
0, diS,,Sj)>l 



M12 




14 



(DA 

(2)C 



M2 

< > 




Ms 



IJ2 



G(3) 
T(4) 



Figure 2: The Kimura 3 ST mutation scheme 



To be consistent with existing notation in the quasispecies hterature, we 
define 3 independent variables, which are simply transformations of the com- 
position variables, and which are called 'surplus' variables in the literature 



Ui{S,) = 1 - —[di{Si, Sy,) + d^i^Si, Sui)] 

U2iSi) = 1 - —[d2{Si, Sw) + ds^Si, S^n)] 
2 

UsiSi) = 1 - —[di{Si, Sy,) + d2{Si, Sui)] 



(51) 



Notice that according to this definition, the maximum value of m = {ui + 
U2 + ^^3)73 = 1 is reached when di = d2 = d^ = 0, that is the sequence Si 
being identical to the wild type S^,- The minimum value for the average base 
composition is obtained when one of the Hamming distance components, say 
di = N, while the others are null dj^ti = 0. Then, d = di = N and u = —1/3. 

The Kimura 3ST mutation scheme result is obtained from the general Eq. 
(1471) if the following symmetries are assumed for the mutation rates 



Atl2 


= 1^21 


— 1^34 


— /^43 


= /Ul 


/^13 


= Atsi 


= fJ'24: 


= /^42 


= /i2 


/^14 


= /i4i 


= fJ'23 


= fJ'32 


= fJ'3 



(52) 

We follow the quasispecies literature convention and define the 3 independent 
'ancestral distribution' coordinates C,^ (the subscript c denoting the saddle- 
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point limit), after Eq. (1521) 



l-2{xl+xl) 
l-2{xl+xl) 



(53) 



The ancestral distribution variable ^ is defined as the steady state analog of 
the 'surplus,' but for the time- reversed evolution process [23]. 
After some algebra, we obtain 



+ + ii + ii + - - ii + ii) 

+ e - e - e)(i - e + e - ef)] 
+ y[v/(i + e + 4^ + mi + e - - ii) 

+ V(i - e + eg - e)(i - - e +lf)] 
+y + e + + mi - e + - ec) 



From this general expression, the average composition 'surplus' u = {ui, U2, U3) 
is obtained by applying the self-consistent condition f{u) = fm- This result 
is equivalent to that derived by [17]. 

2.7 Analytic results for the symmetric mutational scheme 

For a symmetric mutational scheme, /ii = /i2 = yUs = we specialize the 
general Eq. fl5^ by setting ^1 = ^,1 = ^1 = C,c, and Ui = u, and thus obtaining 
an expression for the mean fitness 





(54) 



m 




(55) 
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This result is equivalent to that derived by [17]. We remark that Eq. fl55l) 
represents an exact analytical expression for the mean fitness of the pop- 
ulation, for any arbitrary microscopic fitness f{u), with the assumption of 
symmetric mutation rates /^i = = = From this exact expression, the 
average composition u is obtained by applying the self-consistency condition 
fm = f{u). In the following sections, we apply Eq. flS5]) to analyze in detail 
some examples of microscopic fitness functions: The sharp peak landscape, 
a Fujiyama landscape, a quadratic fitness landscape, and a quartic fitness 
landscape. We note that the h = 2 case contains a symmetry in the muta- 
tion terms about ^ = 0. In the general h > 2, this symmetry will be lost. As 
we will see, loss of this symmetry leads to a change in the order of the error 
catastrophe phase transition. 



2.7.1 The sharp peak landscape 

We shall first consider the sharp peak landscape, which is described by the 
function 

f{u) = ASu,i (56) 

That is, only sequences identical to the wild-type replicate with a rate A > 0. 
From Eq. fl55|l . we notice that this implies: = 1, if ^ > 3/i, or = 
otherwise. Therefore, we obtain for the mean replication rate 

_ J A- 3/i, A>3fi , . 

^'""I 0, A<3fi ^^'^ 

The fraction of the population at the wild-type pw is obtained from the self- 
consistent condition = p^A, 

There exists an error threshold in this case, which is given by the critical 
value Acrit = 3/i, as shown in Eqs. (1571) . ( |58ll and displayed in Fig. [3l The 
phase transition is first order as a function oi A/n. 

One may compare this result with the error threshold observed in the 
binary alphabet case, which is [19] Acru = Z^- This result is intuitive, because 
in the 4 letters alphabet, there exist 3 mutation channels to escape from 
the wild type instead of just one as in the binary alphabet, and therefore a 
stronger selection pressure is required to retain the wild-type features. 
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2 4 6 8 10 

A/|x 



Figure 3: Average composition u, magnetization C,c and fraction of the popu- 
lation at the wild-type sequence p^, as a function of the parameter A/ fi, for 
the sharp peak fitness. 

2.7.2 The Fujiyama fitness landscape 

The Fujiyama landscape is obtained as a linear function of the composition 



We will present analytical results for the symmetric case ttj = a, Hi = /x. 
Thus, C,l = C,c = C,c = C,c- Substituting in Eq. fl55l) . we have 

fm= max ha^, - ^^(1 + Q + ^fiy^il - Q{1 + 3^ A (60) 

We look for a maximum 



f[u\ = aiMi + 02^2 + as^a 



(59) 



m 




3 3 2 - 6^c 







(61) 



c 



From this equation, we obtain 




(62) 
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To obtain the average base composition u, we apply the self-consistent con- 
dition fm = f{u) = Saw, to obtain 

u = -\ l-2^ + -Ja^ -fia + fiA (63) 
3 \ a a J 

Clearly, no phase transition is observed in this fitness landscape, as < m < 1 
for < a < oo. This result is in agreement with the analysis presented in [15], 
were a quantum spin chain formulation was employed. 



2.7.3 Quadratic fitness landscape 

The quadratic fitness landscape is given by the general quadratic form 

3 



i=l 



We will present the analytical solution for the symmetric case ai = a, 
j3i = f3, with the symmetric mutation scheme /ij = fi. Under these conditions, 
we have = ^c — = ^c, and from Eq. ( l55l) we have for the mean fitness 



fm= max {^pec + 3<c - ^/^(l + Q + ^/iV(l - ail + 3q] 



(65) 



The maximum is obtained from the equation 



^ = 3/3^0 + 3a - -/i + -/i , ^ = = (66) 

d^c 2^ 2'^2v/l + 2^,-3e ^ ' 

From Eq. (1661) . we obtain 

2 2 ^1 + - 3e ^ ^ 

As shown in Appendix 2, this equation can be cast in the form of a quartic 
equation, whose roots are the values of ^c- The average composition u is 
finally obtained through the self-consistency equation 

U = f\n) = ^(3u^ + 3au (68) 
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We find that the error threshold transition towards a selective phase for 
a = is defined by > 0, m > 0, at /9 > 3/2/i. The value of u is continuous 
at the transition, as it is straightforward to check from Eq. f l66|) fm{C,c = 
0) = fmi^c = 2/3,/? = 3/i/2) = 0, which implies after Eq. ([68]) (for a = 0) 
that u = when approaching the critical point (3 = 3fi/2 from both sides. 
However, jumps from [for p (3/^/2)-] to 2/3 [for /3 (3/i/2)+] 
(Appendix 2). To analyze the order of the transition, we expand Eq. f l65l) as 
a quadratic polynomial in C,c in a neighborhood of the critical point f3 ^ 3/i/2, 

r |(/?-3/i/2)e2 ,p<3fi/2 

\ |;3_^+(2/3-3/i)(ee-i) + (i/3-3/i)(^,-|)2 ,/3>3/x/2 

(69) 

we find that the first derivative dfm/d(5[l3 (3/i/2)~] = 0, while dfm/dP[P —>■ 
(3/i/2)+] = 2/3, and thus it has a discontinuous jump from to 2/3. There- 
fore, the phase transition is first order as a function of /3//i. We notice that 
the order of the phase transition, for a similar quadratic fitness landscape, is 
found to be of second order for a binary alphabet [19]. 

When < a//3 < I ^y^ — ij, as shown in Appendix 2, we find a 

finite jump in the magnetization from C,c,+ to C,c,-, with ^c,± = 1/3(1 ± 
V^l - 18a//?-27(a//?)2). This resuh is in agreement with [15], where an 
alternative method of quantum spin chains was applied for the derivation. A 
complete analysis of the different possible cases other than this, is presented 
in Appendix 2. 



2.8 Quartic fitness landscape 

As a final example, we consider a quartic fitness landscape 

f{u) = i2j^t (70) 
1=1 

As in the previous cases, we consider the symmetric mutation rates /ij = 
/i. Pi = p, and hence ^c,i = (,c- Considering this fitness function in the 
general equation (155|1 . we have that the mean fitness is given by the analytical 
expression 

U= max \jP^t - 1/^(1 + ^c) + |/iV(l - ec)(l + a^c)) (71) 
{-3<Cc<i} [4 2 2 J 
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Figure 4: Average composition u and magnetization as a function of the 
parameter P/fi, for the quadratic fitness when a = 0. 

The average composition u is obtained by applying the self-consistent condi- 
tion 

f{u) = ^f3u' = fm (72) 

In Fig. [5l we present the values of u and ^c, as obtained from Eqs. (17T|) . 
(172|) . as a function of the parameter A discontinuous jump in the bulk 
magnetization from = to = 0.971618 is observed at j3/fi = 3.67653. 
By expanding Eq. (1711) near the critical point, after similar procedure as in 
the quadratic fitness case, we find a discontinuous jump in the derivative 
dfm/d(3, from to 0.66841101. Therefore, the phase transition is first order 
in The average composition u, however, experiences a fast but smooth 
transition. This behavior is much alike the one observed in the sharp peak 
fitness landscape, Eq. fl571) and Fig. ([3]), except for the fact that the average 
composition u is continuous at the transition. Indeed, from a purely math- 
ematical perspective, a fitness function following a power law = ku"', 
for < M < 1, will satisfy the limit 

lim /„(m) = kSu,i (73) 

n— >oo 

which is precisely the sharp peak landscape, Eq. fl5B]l . 
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Figure 5: Average composition u and magnetization as a function of the 
parameter for the quartic fitness landscape. 

2.9 Symmetric case, general h, with application to the 
amino acid alphabet /i = 20 

An alternative language to describe molecular evolution is in terms of mu- 
tation and selection acting over the translated protein sequence, which is 
drawn from an /i = 20 amino acid alphabet. For the parallel model, the 
time evolution of an infinite population of protein sequences is described by 
the system of differential equations ([2]), with h = 20. Thus, we are lead to 
consider the h = 20 case. We first consider the symmetric case with general 
h. 

For an alphabet of arbitrary size h, a symmetrical mutation scheme /Iq,/? = 
/i, and a symmetrical fitness function that leads to = Xc, we define a 
magnetization coordinate = 1 — hxc, and Eq. fHlj) reduces to 

fm = max (/(^c) + ih- V(l - m + ih- m 

+ (74) 

As an example of application of Eq. ( l74l) . we consider the sharp peak 
fitness landscape f{C,c) = ^^^c,i- Then, from Eq. ( 1741) we obtain the mean 
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fitness 



_ j A-{h-l)fi, A>{h-l)i^ , . 

0, A<{h-l)iJ. ^'^> 

We obtain the fraction of sequences in tiie wild type, p^, by applying the 
self-consistency condition = Apyj, which yields 

p =1 1-^' (76) 
I 0, A<{h-l)ii ^ ^ 

This result is intuitive, since there exists h — 1 independent mutation chan- 
nels for the sequence to escape from the wild type. Moreover, for a general 
alphabet of size /i, a first order phase transition occurs at the critical point 

A' it = {h- 

As a second example, we consider the quadratic fitness landscape for an 
alphabet of size h, f{^c) — {h — l)/3^^/2. For the quadratic fitness function 
we can work out the order of the phase transition for general h. We consider 
/(O — l)/5'C^/2- There is a phase transition at j3crit = 2fi{h— l)/h. The 
magnetization jumps from = at /3 — > to = {h — 2)/{h — 1) at 
13 — > The first derivative at the critical point is: 

dfm _{ 0, p< 2/i(/l - l)/h 



d/3 



r 0, (3<2i^{h-l)/h 

\ {h-2)y[2{h-l)], P>2i,{h-l)/h 



Thus, the jump in the first derivative is {h — 2)^/[2(/i — 1)]. Thus, the 
transition is second order for h — 2 and first order for /i > 2. 



3 The /i-states Eigen model 

The Eigen model conceptually differs from the parallel or Kimura model 
because it is assumed that mutations arise as a consequence of errors in the 
replication process. For an alphabet of size h, the system of equations which 
describes the time evolution of the probabilities Pi, with 1 < i < h^, is 



dpi 
dt 



N 



Y^[Bijrj - 5ijDj]pj -Pi 



N 



- Dj)pj 



(78) 
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Here, is the replication rate of sequence Si, and the components of the 
matrix 



Bi 



N-di. 



h-i 



(79) 



represent the transition rates from sequence Sj into Si, where 1 — g is the 
probabihty to copy a nucleotide without error, and q is the probability per 
site for a base substitution during the replication process. 

We consider a generalized version of this, by considering that the base 
substitution probabilities are not necessarily identical nor symmetric. That 
is, for a base substitution /3 — a, we consider a probability qaj3 7^ qj3a- 
Correspondingly, we define h — 1 independent base compositions < < 1 , 
which can also be interpreted as normalized Hamming distances with respect 
to the h-ih reference species, = d'^/N. For this generalized mutation 
scheme, the transition rate matrix components are defined by 



n 

a^f3=l 



(80) 



Here, q = qap, and the are as in Sec. 2.1. 



3.1 The /i-states Eigen model in operator form 

By similar arguments as in the parallel model, we formulate a Hamiltonian 
operator for the Eigen model 



-H 



N 

n 



h 



xNf 



1 ^ 1 Tl ^ 

-^at(/)0a(O -Nd -^at(/)ea(0 
1=1 J L 1=1 



(81) 



Here, the matrices t"'^ are defined as in the parallel model by Eq. (ITTil and 
in the matrix array = (6^, B^, . . . , Q^~^), the matrices 0" are defined as 
in Eq. f|T^ . Let us define the coefficients fiap = Nqap. The degradation 
function is given by Di = Nd{x^,x^, 



.x" Then, we have for g <tC 1 



h 



iVln(l -q) Nq 



N J2 

a^f3=l 



(82) 
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The Hamiltonian operator Eq. (15T]) is expressed, to 0{\/N), by 



1 ^ 1 [l ^ 

-Y^a\l)Ba{l) ~Nd -Y^a\l)Ba{l) 

1=1 A L 1=1 



(83) 



To study the equihbrium properties of the system, as in the case of the 
parallel model, we calculate the partition function by performing a Trotter 
factorization 



Tr e-^^P 



2n 



-j-j- dAj 



271 



e lim 

M^oo 



.k=i j=i 



-S[z*,z\ 



Here, 



M 



(84) 



(85) 



k=l 



where the matrix elements in the coherent states basis are given by the 
expression 

({4}|e-^^|{4-i}) 



e-^Ef=i{^l(i)-4(i)-25^{i)-4-i(i)+5^_i(i)-4-i{i)) 



1 ^ 1 r 1 ^ 1 \ 

-5^i:(j)ezVi(j) -eNd -5^i:(j)04-i(j) 
i=i J L i=i J ^ 



Let us introduce the (/;,— l)-component vector field Xk = (x^, x^, . . . , ^), 
and an integral representation of the corresponding delta function 



(86) 



1 



M 



-1) 



k=l 

nn 

k=l a=l 



1 ^ 

-^z^(j)e4_i(j) 



27r 



(87) 
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Similarly, let us introduce a second set of fields 77"^, 



M 



k=l 



1 ^ 



■ M 

n 

.fc=i 



xe 



27r 



(88) 



Inserting both constraints Eq. flHTl) and Eq. flHHl) in the expression for the 
trace Eq. fl84|) . we obtain 



Z = lim / TT P[r7"^]P[r/"^] 



AT 



j=l {zo}={e"^5'M} 

After performing the Gaussian integral over the fields z* we obtain 



(89) 



Z = lim / TT V[ri'"^]V[f]' 



N 

xV[X]l[e-''^[detS{j)]-' (90) 



The matrix S'(j) has the structure 



( 1 00 

-A^ij) I 
-A3(j) / 








V 







(91) 
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where Ak{j) = I + e[Y.l^p=i f^apVk'^ + Xfe ■ 0]. We obtain 



det S{j) = det 



M 



k=l 



det 

Trln 



(92) 



Substituting this last expression into the functional integral Eq. fl90l) . and 
then performing the integrals over the A fields, we obtain 



/h 
V[x]V[x] TT I?[r/"^]r'[r/"^] 



AT 



no 



(93) 



Here, 



M h 

Q = lirn^ Tr f J] [/ + e( ^ r/f r"^ + Xk ■ &)] 

k=l a^P=l 

= lim Trfe^^^=iE'^^=i^fc"''""^+^''-®l 
= Tr f e^o ««*'E^^;3=i ^?"''(t')r"''+5;(i')-e] 



After taking the limit M —>■ oo, we obtain 



/3l^-5[f,i',{f)°/5},{??-/'}] 



(94) 



(95) 



Here, 



j-t h 

-N / rft'{-x(t') ■ x{t') - r/"^(t')r?"^(t') 

(96) 
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where 



With this last simphfication, the effective action becomes 



(97) 



S[x,x,{r''},{v"''}] 



j-t h 

-N / dt\-x{t') ■ x{t') - J2 



^gE^^^=i/^./3(r;"^-i)j[^(i')] _ d\x{t')]) -NXiiQ 

(98) 

3.2 The large N limit of the /i-state Eigen model is a 
saddle point 

By assuming that the sequence length N is very large, — > oo, we can eval- 
uate the functional integral Eq. fl95l) by a saddle point method. Considering 
the action defined in Eq. fl98l) . we have 



5S 



5S 



6S 



Sx'^ 
6S 



Sc,S,,{fj?''},{v?''} 



i)dm 



dx"" 



+ 



dd[xc 



1 5Q 



= 



iV hr" - 



N i 7]f - 



QSx^ 
1 5Q 







QSf] 



a/3 



We have therefore the system of equations 



dx'^ 







dd[xc 



dx'^ 







(99) 



(100) 



(101) 



x: = (0") 



(102) 
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rif = (r°^) (103) 

where we defined 



((•)) 



(104) 



After tlie saddle-point analysis, we obtain an exact analytical expression 
for the mean fitness fm of the population, in the limit of very large sequences 
^ oo, for an arbitrary microscopic fitness function f{x) and degradation 
rate d{x) 

/„= lim ^ = max [e^'^^^-^^-^^'^^'-'^ f{x,) - d{x,) 

h 

-S,-X,- ^f^f + Amax] 

(105) 

As shown in Appendix 3, the r/"^ can be eliminated in terms of the com- 
positions, to obtain the final expression 

/™ = ^ max^ |/(xj,a;^,...,x^^)e^'^^=i^"^t^^-"^l 

xgE^=iMh/3[y(T^E^iRK+^7=;^?-i] _ d{x\, xi,..., x;?-^) 

(106) 

Here, the compositions < < 1 are defined within the simplex Yla=\ — 
1. We note that the mutation terms in Eq. (11061) are the exponential of the 
mutation terms in Eq. (jBl), which is a result of the mutation terms in the 
Eigen Hamiltonian, Eq. fl83|) . being the exponential of those in the parallel 
Hamiltonian, Eq. ([Tj 



3.3 The purine/pyrimidine alphabet h=2 

As an application of our general solution Eq. ( ]106p . we first consider the 
purine/pyrimidine alphabet with h = 2. The maximum principle for this 
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binary alphabet can be derived from Eq. fll06p by assuming a symmetric 
mutation rate /ii2 = /i2i = /U, and by noticing that a single composition (or 
normalized Hamming distance) xl = Xc is required in this case, 

7^2)= max /eMV-={i-^)-il -d(x,)) (107) 

0<Xc<l I J 

It is customary to use in this case a magnetization coordinate, defined as 
= 1 — 2xc, and hence Eq. (11071) becomes 

/i^) = max {e^Iv^-^l/(ec) - di^] (108) 

Eq. (11081) is a well known result [14, 19], and a special case of our general 
result Eq. (|T06ll . 

3.4 The nucleic acid alphabet h = 4 

For a general, non-symmetric mutation scheme as in Fig. 1, by considering 
h = 4 in our general result Eq. (I106P we obtain 

U = max |/(x^,x^,a;3)e^'^/'-'^"''[v^-?]eS'^i/^"^[V-?(i-SU-Z)-<l 
^^j:U^^^Wi^-^U^'^y^+^U^'^-n _ c^(x^,a;,^a;^)l (109) 



Here, the compositions are defined within the simplex xl + x'^ + x^ < 1. 

An interesting reduction of this general model is provided by the Kimura 
3ST mutation scheme, introduced in section 2.6, and represented in Fig. 2. 
We obtain the solution for the Kimura 3ST mutation scheme by assuming 
the following symmetries in the mutation coefficients 



fJ'12 


= At21 


= /^34 


= A^43 


= AH 




= f^3l 


= fJ'24: 


= /i42 




/il4 


= /X41 


= /U23 


= /i32 


= /U3 



(110) 

along with the magnetization coordinates defined in agreement with Eq. 

dnoD, 

= i-2{xi+xi) 

C = l-2{xl + xl) 

= l-2{x\ + xl) (111) 
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With these assumptions, after some algebra, Eq. (llU6p reduces to the expres- 
sion for the Kimura 3ST scheme. 



fm = max < e^^ 

xe'^2(iV(i+«c+«c+«?)(i+«^'-€c-«?))+W(i-«c+«?-«?)(i-€c-?c+«?)-i) 

x/(C\C',a-^(C\ea} (112) 

3.5 Analytical results for the symmetric mutation scheme 

If the mutation rates are identical /ii = /i2 = A^s = f^, then we have the 
symmetric case = C,c = = ^c, and after Eq. (11121) we obtain 

U= max |ei'^[-i-«^+V(W)(i-?e)]j[^j _ ^j^jj (^S) 



3.5.1 The sharp peak fitness landscape 

Let us first consider the sharp peak landscape f{u) = {A — Ao)5u,i + ^o, 
with A > Aq. That is, the replication rate is f{u = 1) = A for sequences 
identical to the wild type, and f{u ^ 1) = Aq, for all other sequences. With 
zero degradation rate, d = 0, we notice that this result implies: = 1 if 
A > Aqc^^, or = otherwise. Therefore, we obtain the mean replication 
rate 



/"^=r. ' .:?3. (114) 



e'^^'A, A > Aoe^^" 
Ao, A < Aoe^/^ 

The system experiences a phase transition which is first order in A. The 
steady-state probability for the wild-type is obtained from the self-consistent 
condition: = Ap^ + Ao{l - p^), 

^"'"1 0, A<Aoe''^ ^^^^^ 

Notice that the error threshold is reached at the critical value Acrit = Aqc^'^, 
as follows from Eqs. (11141) . (IllSp and as displayed in Fig. El We notice that 
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Figure 6: The average composition u and magnetization C,c are represented 
as a function of the parameter A/Aq, for the sharp peak landscape. The 
mutation rate was set to = 1.0. Also shown (inset) is the fraction of the 
population located at the peak, p^. 

this result differs from the analytical value obtained for the binary alphabet 
[19], A^^J-^ = Aqc^. The additional factor of three which is explicit in the 
exponent is clearly a consequence of the existence of three mutation channels 
into which evolving sequences can escape from the wild-type. This effect, 
which is purely entropic- and not fitness-like, is an explicit consequence of 
the larger alphabet size. 

3.5.2 The Fujiyama fitness landscape 

We will consider the Fujiyama fitness landscape, which is a linear function 
of the composition 

3 

f{u) = ^(aiUi) + ao (116) 
1=1 

For the symmetric case, = a, /Xj = /i. Therefore, we have = = = 
^c- The mean fitness, in the absence of degradation, from Eq. 01131) becomes 

fm= max \{3a^, + ao)ei^[-i-«^+V(W)(i-Cc)]| ^^^^^ 

-I<5c<l I- J 
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By maximizing with respect to ^c, ^of- = 0, we obtain the nonhnear equation 



/i 3 /i (3a^c + «o)(3^c - 1) 
a an uatc = - 



;il8) 



No error threshold is observed for this fitness landscape, except for the trivial 
limit a — > 0, > 0. The average surplus u is obtained by the self-consistent 
equation 



fm = 3aM + ao 



;ii9) 



3.6 The quadratic fitness landscape 

Next we consider the quadratic fitness landscape 



N 



fin) 



f + oiiUi 1 + 1 



i=l 



(120) 



For the symmetric case, = /5, a, = a, fii = fi, we have ^* = and 
= ^c- Thus, the mean fitness, for a null degradation rate, after Eq. (11131) 

is 



1211 



We maximize with respect to C,c, = 0, to obtain 



1 



1 



122) 



v/(l + 3ec)(l-ec) 

The average base composition m is obtained from the self-consistent condition 

3 



fm = f{u) = -I3u + 3au + 1 



:i23) 



The selected phase, > 0, m > 0, occurs for /3 > 1.8096yU when a = 0. 
The value of u is continuous at the transition, as it can checked from Eq. (11211) 
that fm{ic = 0) = /„(^c = 0.2289, /3 = 1.8096, = 1) = 1, which implies 
after Eq. ( 1123^ (for a = 0) that m = when approaching the critical point 
P = 1.8066/i from both sides. However, jumps from (for P — ^ 1.8066/i~) 



33 



JJLP 



0.6 
0.5 
0.4 
0.3 
0.2 
0.1 









10 

p/^l 



15 



20 



Figure 7: The average composition u and magnetization are represented 
as a function of the parameter for the quadratic fitness, when a; = 

to 0.2289 (for (3 1.8066/i+). By expanding Eq. ffmi) near the critical 
point, after a similar procedure as in Eq. (1691) for the parallel model, we find 
a discontinuous jump in dfm/dp from to 0.06883. Therefore, the phase 
transition is of first order in f3. A graphical representation is displayed in 
Fig.E 

3.7 The quartic fitness landscape 

As a final example, we consider the quartic fitness landscape, 

m = f2j^t + ^ (124) 

i=l 

We further consider the symmetric case /ij = fi, j3i = P, and hence ^* = ^c- 
From the general expression Eq. flll3p . we obtain an analytical expression 
for the mean fitness 

U= max { ( -PC + l] ei'^[-i-^^+v/(i+35c)(i-«c)] 1 (i25) 
{-i<«c<i} [\A J J 
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Figure 8: The average composition u and magnetization C,c are represented 
as a function of the parameter j3/n for the quartic fitness landscape 

The average composition of the population u is obtained from the self- 
consistent condition 

f(u) = ^Pu^ + l = U (126) 

In Fig. [HI we present the values of u and ^c, as obtained from Eqs. fll25p . 
(11261) . as a function of the parameter (3/fi. We notice that a discontinuous 
jump in the bulk magnetization from = to = 0.779856 is observed at 
13/ n = 10.776165. By expanding Eq. (11251) near the critical point, we find 
a discontinuous jump in dfm/dp, from to 0.066213. Therefore, the phase 
transition is of first order in j3. The average composition shows a fast but 
continuous transition. This behavior is much like the one observed in the 
sharp peak fitness landscape, Eq. (11141) . and in the corresponding example 
for the parallel model. 

3.8 Symmetric case, general h, with application to the 
amino acid alphabet h = 20 

We consider the case of the amino acid alphabet, which is derived from our 
general solution Eq. (I106p by setting h = 20. In particular, when a symmetric 
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mutation scheme is assumed = /i, for all and = x^- We first 
consider the symmetric case for general h. 

For an alphabet of size we define a magnetization coordinate = 
1 — hxc, and obtain that Eq. fll06p reduces to 

= max I /(ec)e("-^)'^[^V(i-5c){i+('^-i)Cc)+V(i-?^)-i] - 

^ {-i/(/^-i)<ec<i}\^^^'^ 

(127) 

As an example of application of Eq. fll27p . we consider the sharp peak 
fitness landscape /(^c) = ~ ^o)'^5c,i + ^O; and zero degradation function 
(i(^c) = 0. Then, from Eq. (11271) we obtain the mean fitness 



^"^"^ An, ' A<Ane('^-i)'^ ^^^^^ 



The system experiences a first order phase transition at Acrit = Aoe^'^~^^^. 
The steady-state probability for the wild-type is obtained from the self- 
consistency condition: = Ap^ + Aq{1 — 



-"w I 1 




P--^ n° A < Aoe('^-')>^ ^ ^ 

The factor (/i — l)/i in the exponential is intuitive, since there exists h — 1 
independent mutation channels into which evolving sequences can escape 
from the wild-type. 

As a second example, we consider the quadratic fitness landscape /(^c) = 
{h — l)/5^^/2 + 1 for an alphabet of size h. We consider the case of the 
amino acid alphabet, with h = 20. We set = 1. By a similar analysis as 
in the parallel model case, we find a phase transition at the critical point 
(3 = 7.483/i. The magnetization parameter C,c has a finite jump from = 
(for /3 7.483^-) to = 0.0827 (for /3 7.483/i+). The mean fitness is con- 
tinuous at the transition, since fm{P — > 7.483/1"*") = fm{P 7.483/i~) = 1, 
which implies that the observable m = at the critical point. We ob- 
serve that the first derivative has a finite jump at the critical point, from 
dfm/d(3{f5 7.483/i-) = 0, to dfm/d(3{(3 7.483/i+) = 0.0437, and there- 
fore the phase transition is of first order. A similar analysis shows that the 
transition is first order for h = 3 a.s well. As with the parallel model, we 
find that the transition for the quadratic fitness function is second order for 
h = 2 and first order for h > 2. 
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4 Conclusion 



Using the quantum spin chain approach, the 2-state, purine/pyrimidine as- 
sumption for quasi-species theory has been hfted to arbitrary alphabet sizes 
h. We have here expressed the general result for the fitness of the evolved 
population as a maximization principle. We have derived the solution for a 
general fitness function using the Schwinger spin coherent states approach. 
We have presented analytic results for the sharp peak, as well as linear, 
quadratic, and quartic fitness functions. For the Kimura 3 ST mutation 
scheme, we have presented an explicit solution for a general fitness function, 
expressed ClS db maximization principle. 

We have also derived the general solution to the Eigen model of mutation 
and selection for arbitrary alphabet size and for a general mutation scheme. 
We have presented analytic results for the sharp peak, linear, quadratic, and 
quartic fitness functions. 

These results bring quasi-species theory closer to the evolutionary dy- 
namics that occurs at the genetic level. 
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Appendix 1 

In what follows, we will need to apply Euler's theorem and three properties 
of the maximum eigenvalue Amax of the matrix M defined by Eq. fj43|) . 

Euler's theorem for homogeneous functions 

Definition 1: Homogeneous function of degree k 

A function /(xi, . . . , Xn) of n variables is homogeneous of degree k if, for all 
a > 0, 

f{axi, ax2, . . . , axn) = X2, ■ . ■ , x„) (130) 
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Euler's theorem 

A different iable function f{xi, . . . , Xn) of n variables is homogeneous of degree 
k if and only if 



d 



^ ^ -/(Xi, . . . ,Xn) — kf(xi, . . . ,Xn) 



1=1 



(131) 



The theorem and its proof is presented in most textbooks of mathematical 
analysis [24]. 



Property I 

The maximum eigenvalue A^ax of the matrix M defined by Eq. (H3|) is a 
homogeneous function of degree /c = 1 in the vector (xj, xl, . . . , x'^~^, {f^a/s})- 
The proof of this proposition follows directly from Definition 1. Notice 
that after Eq. (H3l). 

h h-1 



M{{x:}, M) = J2 /^"z^^"'' + E 



a=l 



:i32) 



Since M is a linear function of the vector (xl, . . . 



, . . . , X^ , ^al3 



M{axl, axl, . . . , ax'^ \ a/ii2, a/i2i, • • •) = aM(xJ, x^, . . . , \ jj^u, /i2i, • • • 



(133) 



Therefore, the maximum eigenvalue, as obtained from the long-time limit 
of the trace, 

Amax(aSc' "^c' • • • ' a^c~\ «Atl2, a/i21, • • •) 
rjij, gtM(ax;l,ox^,...,asJ?~^,a/ii2,a/i2iv) 

rj^ ^taM{xl,x^,...,Xc~^,IJ.l2,l^2i,---) 
= ttAniax(a;c; 2;^, . . . , X^ ""^j /il25 /^21, • • •) 

is also a homogeneous function of degree k = 1, after Definition 1. 



lim - In 

t^oo t 



lim - In 

t^oo t 



(134) 
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Property II 

Amax satisfies the identity 



h-1 h 



a=l 



(135) 



The proof follows directly by application of Euler's theorem, for a homo- 
geneous function of degree k = 1. 

Notice that, after the saddle-point Eqs. (!37|) - fj40|) . we have the following 
identities 



Tr 









lini 

i^oo Tr e*^ 



d 



In 



lim „ 



Tr e 



tM 



d 



d 



- Ar 



a/3 



(136) 
(137) 



Substituting Eqs. fll36p - fll37l) into Eq. (11351) . we obtain the identity 

A^..-> ■a;^x°= > ■ Ur.Rir'"') (138) 



h-l h 



a=l 



After Eq. flM . Eq. ^ becomes 



/,„ = max 



/({<}) - E ^-/^(i - (^"')) 



(139) 



Property III 

The 'average' (A) of an arbitrary h x h matrix A, satisfies the identity 



(A) = hm 



Tr 









fi;^ Tr e*^ 
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2/max^^max 



(140) 



with ^max the eigenvector corresponding to the maximum eigenvalue A^ax of 
the matrix M in Eq. 



The proof follows by considering the unitary h^h matrix P = [yi, ^2, Vs, ■ ■ ■ 
= P"^ whose columns are formed by the h orthogonal eigenvectors jja of 
M which satisfy 



XaVa I < a < h 



6. 



af3 
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From elementary linear algebra, the matrix P induces a similarity transfor- 
mation which diagonalizes M, that is PMP^^ = diag(Ai, A2, . . . , Xh) = D. 
Hence, it also diagonalizes the exponential of M, 



p tMp-i ^ P(I + tM + -MM + ...)P"^ 

V 2! ^ 

= I + tPMP^ + -PMP^PMP^ + ... 



2! 



l + tB + -D^ + . . . 



(142) 



The 'average' of an arbitrary h x h matrix A, defined by Eq. fll40p . is 
calculated as 



(A) = hm 



Tr 


Ae*M 







t^oo Tr e*'^ 



lim 

t^oo 



Tr 


PAP-iPe*^p-^ 


Tr 


PgtMp-l 





Tr 



lim 
lim 

t—>co 



PAP^e 



En 



CAy^,. (143) 



which proves the Property III. 

We can express the eigenvector y^ax = {y^,y'^, ■ ■ ■ ,y^)'^ in terms of the 
fields x^, by combining the result in Property III, with the saddle-point 
equations Eq. (11361) .( 11371) . as follows 
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(y 



a\2 



These equations are inverted to obtain 



l<a<h-l 
a = h 



(144) 



(145) 



Equipped with this resuh, we can now calculate the 'averages' (r"'^) in 
Eq. dnHD, 



i/max ' i/max 



(146) 



Substituting Eq. (11451) into Eq. (11461) . we obtain the result 



(147) 



Substituting Eq. (I147P into Eq. (I139p . we obtain the final solution for the 
mean fitness of the parallel model in an alphabet of size h, with an arbitrary 
mutation scheme, 



(h) 

m 



,x 



h-1 
a^f3=l 



h-1 



a=l 

h-1 



\ 



h-1 



1-E 



xi ] -x:. 



7=1 



+ ^f^hl3 

/3=1 



\ 



h-1 



h-1 



7=1 



7=1 



(148) 



41 



Appendix 2 



By performing elementary algebraic manipulations Eq. (j6 

3/?ec + 3« - -/^ + -^1=EI^= = 

2^ 2 2^1 + 2^.-3^ 
can be cast into the standard form of a quartic equation 

AC + + + D^, + E = (149) 

where, by defining jl = ^/(3 and a = a/ [3, the coefficients correspond to 

A = 3 

5 = 6a-3/i-2 (150) 
C = Sfx^ — 3ajl + 3(5^ + 2/i — 4a — 1 

= -2(5 - 2a^ + /i + 2a/i - 2/i^ 
E = -a^ + ajx (151) 

We remark that this quartic equation introduces additional, unphysical solu- 
tions to the original Eq. fl67j) . However, discarding these unphysical solutions 
whenever appropriate, the quartic Eq. ( 1149^ allows us to obtain explicit an- 
alytical expressions for in the entire region of parameters. Following Fer- 
rari's method [25], we define the parameters 

3^2 C 

1 a /i ajl 5/i^ 



^ ^ ^ (152) 
2 3 2 6 2 8 ^ ' 



B^ BC D 



02 



8^3 2^2 A 

4 4a 2/i /i^ 3a/i2 2>fi^ . , 

~27 ~ T ^ T ~ T ^^~8~ ^ ^ 

35^ E 

~ ~ 256^4 ^ 16^3 ~ 442 ^ 1 

5 7a 5a2 oi^ -j^- ^j,- ~2 



432 108 72 12 16 216 72 8 8 288 

H ^ H ^- ^- — H ^ (154) 

16 32 96 32 256 ^ ^ 
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and solve the depressed quartic equation in the auxihary variable = + 
B/AA, 

+ a\Z^ + + as = (155) 

We analyze the different cases in the parameter space that defines the 
possible solutions of this equation. 

Case 1: a2 = 0. This situation arises at the critical value 

/i(i) = ^ + 25 (156) 

We obtain four possible roots, according to the general formula 



^ ^ _B_ ^ ^ /-ai ± ^af -4a3 



^ (^2 ± V2v^l - 9a(2 + 3a) ± |1 - 9a(2 + 3a)|) (157) 



6 

Depending on the sign of the term in the square root, we have the following 
solutions 

i) If 1 — 18(5 — 27(5^ > 0. This situation occurs when —\<ol< 
I iy\J\ — 1 j ) and the solution is 



1 



ec,± = -(l±Vl- 18a -2752), ^^=1 (158) 
ii) If 1 — 18(5 — 27(5^ < 0. This situation occurs when o. > ^ (\/^ ~ ^ ' 

ic = \ (159) 

We shall consider c5 > in the region of physically meaningful parameters. 
When (5 = 0, a non- selective phase is obtained, from Eq. fll58p . if /? < 
At /3 = for = 0, a finite 'jump' in the value of from to 2/3 defines a 
phase transition, where the value of u varies continuously from to a positive 
value. 

When < a < 1 (^y^ - l) , a finite jump in the bulk magnetization from 
^c- to ^c,+ is observed. This result is in agreement with [15]. 
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Case 2: 03 = 0, a2 7^ 0. This situation occurs at the critical values 



/if 
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(^1 + 3a + 2V49 - 18a - 27&^^ , < a < 1.054444 (160) 



^(3) = A /'i + _ 2V49 - 18a - 2752) , 1 < a < 1.054444 (161) 
39 V / 

In this case, the quartic equation in z factorizes, 



z{z + aiz + 02) 







(162) 



There is a solution z = for Eq. ( 1162^ . This is however not a solution 
of Eq. (l67j) . but an artifact of introducing the algebraic transformation into 
the fourth order polynomial Eq. (11491) . 

The solutions corresponding to the remaining cubic equation in Eq. (11621) 
are analyzed as follows. Let us define the parameters, 



Sl 



S2 



0-2 



2\ 1/2' 



+ — + — 

2 V27 4 



02 



2\ 1/2' 



— + — 

27 4 



1/3 



1/3 



(163) 



Then, we have the following cases. 

Case 2. a: Consider fi = \ defined by Eq. (I160p . This situation is 
possible when < a < 1.054444. Within this range of values for a, the 

3 2 

parameter |^ + ^ > 0. Then, we find a single real solution 



^^ = 1^(t - 18a + V49 - 18a - 27a2) + si + 
39 V / 



S2 



(164) 



Case 2.b: Consider yU = /ic , defined by Eq. fll6ip . This situation is 
possible when 1 < a < 1.054444. Within this range of values for a, the 

3 2 

parameter + ^ > 0. Then, we find a single real solution 



= i- (^7 - 18a - V49 - 18a - 27a2) + si + ss 
39 V / 



(165) 
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Case 3: 03 7^ 0, 02 7^ 0. In this case, we consider again the general quartic 
Eq. (11491) . Following Ferrari's method [25], we find 4 possible roots 



ic,m-~-\P±\u (166) 



Here, we defined 



(J 



Q 



3 _ p2 _ C I 1(aBC _ oD 



A3 J-^ ' 



3B£ 
4 A2 



2f + 2 



^Ai 



P = 



(168) 



4A2 4'^^A2 ^A A^i)^ 5 7^ U 

f/=<^ / , (169) 

!*-2f-2v/^r^, P = 

From Eq. (I166p . the largest real root corresponds to the physical solution of 
Eq. dSB. 

The parameter yi in Eqs. (I167H1691) is obtained as the real root of the 
auxiliary cubic equation 

1/^ + 721/' + 7iZ/ + 70 = (170) 
Here, we defined the parameters 



-4 





C 


72 = 








7i = 








7o = 
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Let us define 



7i _ 72 
^ 3 9 

r = -(7i72 - 37o) - — 

A = g^ + r^ (172) 

We have three possible cases: A > 0, A = 0, and A < 0. 

Case 3. a: A > 0. In this case, we have one real root yi for the auxiliary 
cubic Eq. (11701) . and two complex roots. The real root to be used in Eqs. 
(I166H169P is given by 

yi = {r + A^/2)i/3 + _ ^1/2)1/3 _ J2 ^^^3^ 



Case 3.b: A = 0. In this case, all roots of the auxiliary cubic Eq. 
are real, with two of them identical, and given by 

y2 = yz = -r"^ (174) 
In this case, we take the root yi in Eq. (11741) . to be used in the formulas Eqs. 

Case 3.c: A < 0. In this case, all three roots of the auxiliary cubic Eq. 
(11701) are real and different. 

y, = 2(r2 - A)i/^os(e/3) - I 



ys 

Here, 9 = tan^^ ^^^^ '^^^^ used in Eqs. (I166H169]) . 



1/2 = -2(r2- A)i/6cos(e/3 + 7r/3) ^ 

-2(r'-A)^/^cos(^/3-7r/3)-^ (175) 



Appendix 3 

The same arguments based on the homogeneous property of Amax and appli- 
cation of Euler's theorem can be repeated for the case of the Eigen model, to 
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obtain a solution for the non-symmetric case, starting from Eq. fllOSp . Now, 
the matrix to consider is 

h h-l 
a^f3=l a=l 

Since M is clearly a homogeneous function of degree k = 1 in the vector 
{xl,xl, . . . ,x^~^, {r]"^}), Amax IS homogeucous of degree = 1 as well (the 
proof is identical as in Property I, Appendix 1 for the parallel model). There- 
fore, after Euler's theorem (see Appendix 1), we have the identity 

h-l h 

After the saddle-point Eqs. fll03l) - fll04p . we obtain the same components for 
the eigenvalue ^max as in the parallel case, Eq. fll46p . 



x^, 1 < a < h - 1 



I-ELJ^^c, a = h 



(178) 



Thus, by applying Property III (Appendix 1) for the Eigen model, we 
obtain 



'Ic \ ' / i/max ' ymax 



K (l - E'Zl x2)+l-x^, f3 = h,ay^h (179) 
y (l - E^=I + E^=J a;?, a = h, (3 y^h 

Thus, the mean fitness for the /i-states Eigen model under arbitrary mu- 
tation scheme is given by the expression 

U = max (/(x^,x2,...,x;;-^)e^'^'^-i^"''[v^-?] 

xgE^il f^.H[^^?{i-j:';zi x2)-x-]+j:';zif^Hpi^{i-T.';zl x2)xUj:';zi x2-i] 

-d{xl,xl,...,x'r')] (180) 
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